Statistics:Introduction/Need To Know
What do I Need to Know to Learn Statistics?
Statistics is a diverse subject, and the mathematics required depends on the kind of statistics being studied. A strong background in linear algebra is needed for most multivariate statistics but is not necessary for introductory statistics. A background in calculus is useful no matter which branch of statistics is being studied, but it is not required for most introductory statistics classes. At a bare minimum, the student should have a grasp of the basic concepts taught in algebra and be comfortable with "moving things around" and solving for an unknown.

Refresher Course
Most of the statistics here derives from a few basic concepts that the reader should become acquainted with.

Absolute Value
|x| \equiv \begin{cases} x, & x \ge 0 \\ -x, & x < 0 \end{cases}
If the number is zero or positive, then its absolute value is just the number itself. If the number is negative, then its absolute value is the positive form of the number (the same number with the minus sign removed).
= Examples =
*|-5| = 5
*|2.21| = 2.21

Factorials
A factorial is a calculation that is used often in probability. It is defined only for integers greater than or equal to zero as:
n! \equiv \begin{cases} n \cdot (n-1)!, & n \ge 1 \\ 1, & n = 0 \end{cases}
= Examples =
*0! = 1
*1! = 1 \cdot 0! = 1
*2! = 2 \cdot 1! = 2
*3! = 3 \cdot 2! = 6
*4! = 4 \cdot 3! = 24
In short, this means that, for n \ge 1:
n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1

Summation
The summation (also known as a series) is used more than almost any other technique in statistics. It is a way of representing the addition of many values without writing + after +. We represent summation using an uppercase Sigma: ∑.
\sum_{i=0}^n x_i = x_0 + x_1 + x_2 + \cdots + x_n
Here we are simply adding the variables (which will hopefully all have values by the time we calculate this). The expression below the ∑ (i = 0, in this case) names the index variable and its starting value (i, starting at 0), while the value above the ∑ is the value the index increments to, stepping by 1. So i = 0, 1, 2, ..., n; for example, if n = 4, then i takes the values 0, 1, 2, 3, and then 4.
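As a sketch only, the three refresher definitions above (absolute value, factorial, and summation) can be written as small Python functions that follow the formulas directly. The names absolute_value, factorial, and summation are hypothetical helpers chosen for illustration; in everyday Python the built-in abs, math.factorial, and sum do the same jobs.

```python
def absolute_value(x: float) -> float:
    """|x| = x if x >= 0, otherwise -x (hypothetical helper; Python's abs does this)."""
    return x if x >= 0 else -x


def factorial(n: int) -> int:
    """Recursive definition: n! = n * (n-1)! for n >= 1, and 0! = 1.

    Defined only for integers n >= 0 (hypothetical helper; see math.factorial).
    """
    if not isinstance(n, int) or n < 0:
        raise ValueError("factorial is defined only for integers n >= 0")
    if n == 0:
        return 1
    return n * factorial(n - 1)


def summation(term, start: int, end: int):
    """Add term(i) for i = start, start + 1, ..., end, stepping by 1."""
    return sum(term(i) for i in range(start, end + 1))


print(absolute_value(-5))                 # 5
print(absolute_value(2.21))               # 2.21
print([factorial(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120]
print(summation(lambda i: 2 * i, 1, 4))   # 20
print(2 * summation(lambda i: i, 1, 4))   # 20, with the 2 moved outside the sum
```

The last two lines show that multiplying each term by 2 and multiplying the finished sum by 2 give the same result.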
= Examples =
\sum_{i=1}^4 2i = 2(1) + 2(2) + 2(3) + 2(4) = 2 + 4 + 6 + 8 = 20
Notice that we would get the same value by moving the 2 outside of the summation (performing the summation and then multiplying by 2, rather than multiplying each term of the summation by 2):
2 \sum_{i=1}^4 i = 2(1 + 2 + 3 + 4) = 2(10) = 20

Infinite Series
There is no reason, of course, that a series has to stop at any predetermined, or even finite, value: it can keep going without end. These series are called "infinite series", and sometimes they converge to a finite value, meaning that the sum gets arbitrarily close to that value as the number of terms in the series approaches infinity (∞).
= Examples =
This example is the famous geometric series:
\sum_{i=0}^{\infty} r^i = 1 + r + r^2 + r^3 + \cdots = \frac{1}{1-r}, \quad -1 < r < 1
Note both that the series goes to ∞ (infinity, meaning it does not stop) and that it is only valid for certain values of the variable r. If r is between -1 and 1 (-1 < r < 1), then the summation gets closer to (i.e., converges on) \frac{1}{1-r} the further you take the series out.

Linear Approximation
Let us say that you are looking at a table of values, such as the one above. You want to approximate (get a good estimate of) the values at 63, but you do not have those values on your table. A good solution here is to use a linear approximation to get a value that is probably close to the one you really want, without having to go through all of the trouble of calculating the extra row of the table.
f\left(x_i\right) \approx \frac{f\left(x_{\lceil i \rceil}\right) - f\left(x_{\lfloor i \rfloor}\right)}{x_{\lceil i \rceil} - x_{\lfloor i \rfloor}} \cdot \left(x_i - x_{\lfloor i \rfloor}\right) + f\left(x_{\lfloor i \rfloor}\right)
This is just the equation for a line through the two neighboring table entries. x_i represents the data point you want to know about, x_{\lfloor i \rfloor} is the known data point just below it, and x_{\lceil i \rceil} is the known data point just above it.
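The geometric series and the linear approximation formula above can both be checked numerically. The following is a minimal Python sketch; geometric_partial_sum and linear_approximation are hypothetical names, r = 0.5 is an arbitrary choice inside -1 < r < 1, and the table entries 1.67065 (at 60) and 1.66691 (at 70) are the ones used in the worked example that follows.

```python
def geometric_partial_sum(r: float, n: int) -> float:
    """Return the partial sum r**0 + r**1 + ... + r**n of the geometric series."""
    return sum(r**i for i in range(n + 1))


def linear_approximation(x, x_below, f_below, x_above, f_above):
    """Estimate f(x) from the line through (x_below, f_below) and (x_above, f_above).

    Assumes x_below != x_above, i.e. two distinct known table entries.
    """
    slope = (f_above - f_below) / (x_above - x_below)
    return slope * (x - x_below) + f_below


# The partial sums approach 1 / (1 - r) = 2.0 as more terms are added.
r = 0.5
for n in (1, 5, 10, 20):
    print(n, geometric_partial_sum(r, n), 1 / (1 - r))

# Interpolate the 0.05 column at 63 from the entries at 60 and 70.
print(round(linear_approximation(63, 60, 1.67065, 70, 1.66691), 6))  # 1.669528
```

For r outside the interval -1 < r < 1 (for example r = 2), the partial sums grow without bound instead of settling down, which is why the formula carries that restriction.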
= Examples =
Find the value at 63 for the 0.05 column, using the values in the table above.
First we confirm from the table that we need to approximate the value; if we knew it exactly, there would be no need to approximate it. As it stands, the value falls somewhere between the entries for 60 and 70. Everything else we can get from the table:
f(63) \approx \frac{f(70) - f(60)}{70 - 60} \cdot (63 - 60) + f(60) = \frac{1.66691 - 1.67065}{10} \cdot 3 + 1.67065 = -0.001122 + 1.67065 = 1.669528
Using software, we calculate the actual value of f(63) to be 1.669402, a difference of around 0.00013. Close enough for our purposes.

Statistics
This material has been imported from the wikibook "Statistics" [http://en.wikibooks.org/wiki/Statistics] under the GNU Free Documentation